Abstract

We propose a new family of discrete energy minimization problems, which we call parsimonious labeling. Specifically, our energy functional consists of unary potentials and high-order clique potentials. While the unary potentials are arbitrary, the clique potentials are proportional to the diversity of the set of unique labels assigned to the clique. Intuitively, our energy functional encourages the labeling to be parsimonious, that is, to use as few labels as possible. This in turn allows us to capture useful cues for important computer vision applications such as stereo correspondence and image denoising. Furthermore, we propose an efficient graph-cuts based algorithm for the parsimonious labeling problem that provides strong theoretical guarantees on the quality of the solution. Our algorithm consists of three steps. First, we approximate a given diversity using a mixture of a novel hierarchical $P^n$ Potts model. Second, we use a divide-and-conquer approach for each mixture component, where each subproblem is solved using an efficient $\alpha$-expansion algorithm. This provides us with a small number of putative labelings, one for each mixture component. Third, we choose the best putative labeling in terms of the energy value. Using both synthetic and standard real datasets, we show that our algorithm significantly outperforms other graph-cuts based approaches.

1. Introduction 

The labeling problem provides an intuitive formulation for several problems in computer vision and related areas. Briefly, the labeling problem is defined using a set of random variables, each of which can take a value from a finite and discrete label set. The assignment of values to all the variables is referred to as a labeling. In order to quantitatively distinguish between the large number of putative labelings, we are provided with an energy functional that maps a labeling to a real number. The energy functional consists of two types of terms: (i) the unary potential, which depends on the label assigned to one random variable at a time; and (ii) the clique potential, which depends on the labels assigned to a set of random variables. The goal of the labeling problem is to obtain the labeling that minimizes the energy.

Perhaps the most well-studied special case of the labeling problem is the metric labeling problem [2, 12]. Here, the unary potentials are arbitrary. However, the clique potentials are specified by a user-defined metric distance function over the label space. Specifically, the clique potentials satisfy the following two properties: (i) each clique potential depends on two random variables; and (ii) the value of the clique potential (also referred to as the pairwise potential) is proportional to the metric distance between the labels assigned to the two random variables. Metric labeling has been used to formulate several problems in low-level computer vision, where the random variables correspond to image pixels. In such scenarios, it is natural to encourage two random variables that correspond to two nearby pixels in the image to take similar labels. However, by restricting the size of the cliques to two, metric labeling fails to capture more informative high-order cues. For example, it cannot encourage an arbitrarily sized set of similar pixels (such as pixels that define a homogeneous superpixel) to take similar labels.

We propose a natural generalization of the metric labeling problem for high-order potentials, which we call parsimonious labeling. Similar to metric labeling, our energy functional consists of arbitrary unary potentials. However, the clique potentials can be defined on any set of random variables, and their value depends on the set of unique labels assigned to the random variables in the clique. In more detail, the clique potential is defined using the recently proposed notion of a diversity [4], which generalizes metric distance functions to all subsets of the label set. By minimizing the diversity, our energy functional encourages the labeling to be parsimonious, that is, to use as few labels as possible. This in turn allows us to capture useful cues for important low-level computer vision applications.

In order to be practically useful, we require a computationally feasible solution for parsimonious labeling. To this end, we design a novel three-step algorithm that uses an efficient graph-cuts based method as its key ingredient. The first step of our algorithm approximates a given diversity as a mixture of a novel hierarchical $P^n$ Potts model (a generalization of the $P^n$ Potts model [13]). The second step of our algorithm solves the labeling problem corresponding to each component of the mixture via a divide-and-conquer approach, where each subproblem is solved using $\alpha$-expansion [25]. This provides us with a small set of putative labelings, each corresponding to a mixture component. The third step of our algorithm simply chooses the putative labeling with the minimum energy. Using both synthetic and real datasets, we show that our overall approach provides accurate results for various computer vision applications.

2. Related Work 

In the last few years, the research community has witnessed many successful applications of high-order random fields to low-level vision problems such as disparity estimation, image restoration, and object segmentation [7, 8, 10, 14, 18, 19, 24, 26, 27]. In this work, our focus is on methods that (i) rely on efficient move-making algorithms based on graph cuts; and (ii) provide a theoretical guarantee on the quality of the solution. Below, we discuss the work most closely related to ours in more detail.

Kohli et al. [13] proposed the $P^n$ Potts model, which enforces label consistency over a set of random variables. In [14], they presented a robust version of the $P^n$ Potts model that takes into account the number of random variables that have been assigned an inconsistent label. Both the $P^n$ Potts model and its robust version lend themselves to the efficient $\alpha$-expansion algorithm [13, 14]. Furthermore, the $\alpha$-expansion algorithm also provides a multiplicative bound on the energy of the estimated labeling with respect to the optimal labeling. While the robust $P^n$ Potts model has been shown to be very useful for semantic segmentation, our generalization of the $P^n$ Potts model offers a natural extension of the metric labeling problem and is therefore more widely applicable to several low-level computer vision applications. Delong et al. [7] proposed a global clique potential that is based on the cost of using a label or a subset of labels in the labeling of the random variables. Similar to the $P^n$ Potts model, the label cost based potential can also be minimized using $\alpha$-expansion. However, the theoretical guarantee provided by $\alpha$-expansion is an additive bound, which is not invariant to reparameterization of the energy function. Delong et al. [6] also proposed an extension of their work to hierarchical costs. However, the assumption of a given hierarchy over the label set limits its application in practice.

Independently, Ladicky et al. [18] proposed a global co-occurrence cost based high-order model for a much wider class of energies that encourage the use of a small set of labels in the estimated labeling. Theoretically, the only constraint that [18] enforces on the high-order clique potential is that it should be monotonic in the label set. In other words, the problem addressed in [18] can be regarded as a generalization of parsimonious labeling. However, they approximately optimize an upper bound on the actual energy functional, which does not provide any optimality guarantees. In our experiments, we demonstrate that our move-making algorithm significantly outperforms their approach for the special case of parsimonious labeling.

3. Preliminaries 

The labeling problem. Consider a random field defined over a set of random variables $\mathbf{x} = \{x_1, \cdots, x_N\}$ arranged in a predefined lattice $\mathcal{V} = \{1, \cdots, N\}$. Each random variable can take a value from a discrete label set $\mathcal{L} = \{l_1, \cdots, l_H\}$. Furthermore, let $\mathcal{C}$ denote the set of maximal cliques. Each maximal clique consists of a set of random variables that are all connected to each other in the lattice. A labeling is defined as the assignment or mapping of random variables to the labels. To assess the quality of each labeling $\mathbf{x}$ we define an energy functional as:

$$E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{c \in \mathcal{C}} \theta_c(\mathbf{x}_c) \tag{1}$$

where $\theta_i(x_i)$ is any arbitrary unary potential of assigning a label $x_i$ to the random variable $i$, and $\theta_c(\mathbf{x}_c)$ is a clique potential for assigning the labels $\mathbf{x}_c$ to the variables in the clique $c$. We assume that the clique potentials are non-negative. As will be seen shortly, this assumption is satisfied by the new family of energy functionals proposed in our paper. The total number of putative labelings is $H^N$, each of which can be assessed using its corresponding energy value. Within this setting, the labeling problem is to find the labeling that corresponds to the minimum energy according to the functional (1). Formally, the labeling problem can be defined as: $\mathbf{x}^* = \operatorname{argmin}_{\mathbf{x}} E(\mathbf{x})$.
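As a concrete illustration of equation (1), the following minimal sketch evaluates the energy of a labeling and finds the minimizer by exhaustive enumeration. The toy potentials, the helper names (`energy`, `brute_force_minimum`) and the example clique potential are assumptions made for illustration only; enumeration is feasible only for very small problems because the number of labelings grows exponentially with the number of variables.

```python
from itertools import product

def energy(labeling, unary, cliques, clique_potential):
    """E(x) = sum_i theta_i(x_i) + sum_c theta_c(x_c), as in equation (1)."""
    unary_term = sum(unary[i][label] for i, label in enumerate(labeling))
    clique_term = sum(clique_potential(tuple(labeling[i] for i in c))
                      for c in cliques)
    return unary_term + clique_term

def brute_force_minimum(unary, num_labels, cliques, clique_potential):
    """Enumerate every labeling and keep the one with the lowest energy."""
    n = len(unary)
    return min(product(range(num_labels), repeat=n),
               key=lambda x: energy(x, unary, cliques, clique_potential))

# Hypothetical toy instance: 3 variables, 2 labels, one clique over all variables.
unary = [[0.0, 2.0], [1.5, 1.0], [0.0, 3.0]]
cliques = [(0, 1, 2)]

# Illustrative non-negative clique potential: number of distinct labels minus one.
def distinct_minus_one(clique_labels):
    return float(len(set(clique_labels)) - 1)

best = brute_force_minimum(unary, 2, cliques, distinct_minus_one)
best_energy = energy(best, unary, cliques, distinct_minus_one)
```

On this toy instance the clique term outweighs the small unary gain of switching the second variable, so the minimizer uses a single label.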


$P^n$ Potts model. An important special case of the labeling problem, which will be used throughout this paper, is defined by the $P^n$ Potts model [13]. The $P^n$ Potts model is a generalization of the well known Potts model [20] for high-order energy functions (when cliques can be of arbitrary sizes). For a given clique, the $P^n$ Potts model is defined as:

$$\theta_c(\mathbf{x}_c) \propto \begin{cases} \gamma_k & \text{if } x_i = l_k, \forall i \in c \\ \gamma_{\max} & \text{otherwise} \end{cases} \tag{2}$$

where $\gamma_k$ is the cost of assigning all the nodes to label $l_k \in \mathcal{L}$, and $\gamma_{\max} \ge \gamma_k, \forall l_k \in \mathcal{L}$. Intuitively, the $P^n$ Potts model enforces label consistency by assigning the cost $\gamma_{\max}$ if more than one label is present in the given clique.
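The clique potential of equation (2) can be sketched in a few lines. The function name `pn_potts` and the cost values below are hypothetical and chosen only to illustrate the two cases.

```python
def pn_potts(clique_labels, gamma, gamma_max):
    """Return gamma[k] if every variable in the clique takes label k,
    and gamma_max as soon as two different labels appear."""
    unique = set(clique_labels)
    if len(unique) == 1:
        return gamma[unique.pop()]
    return gamma_max

# Hypothetical costs; the model requires gamma_max >= max(gamma).
gamma = [0.0, 0.5, 1.0]
gamma_max = 2.0

consistent_cost = pn_potts((1, 1, 1), gamma, gamma_max)
inconsistent_cost = pn_potts((0, 1, 0), gamma, gamma_max)
```

Note that the cost of an inconsistent clique does not depend on how many labels it contains, which is the rigidity that the diversity-based potentials introduced later relax.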
$\alpha$-expansion for the $P^n$ Potts model. In order to solve the labeling problem corresponding to the $P^n$ Potts model, Kohli et al. [13] proposed to use the $\alpha$-expansion algorithm [25]. The $\alpha$-expansion algorithm starts with an initial labeling, for example, by assigning each random variable to the label $l_1$. At each iteration, the algorithm moves to a new labeling by searching over a large move space. Here, the move space is defined as the set of labelings where each random variable is either assigned its current label or the label $\alpha$. The key result that makes $\alpha$-expansion a computationally feasible algorithm for the $P^n$ Potts model is that the minimum energy labeling within a move space can be obtained using a single minimum st-cut operation on a graph that consists of a small number (linear in the number of variables and cliques) of vertices and arcs. The algorithm terminates when the energy cannot be reduced further for any choice of the label $\alpha$. We refer the reader to [13] for further details.

Multiplicative Bound. The labeling problem, and many of its special cases including the one defined by the $P^n$ Potts model, is known to be NP-hard. However, due to its practical importance, many approximate algorithms have been proposed in the literature (for example, the aforementioned $\alpha$-expansion algorithm for the $P^n$ Potts model). An intuitive and commonly used measure of the accuracy of an approximation algorithm is the multiplicative bound. Formally, the multiplicative bound of a given algorithm is said to be $B$ if the following condition is satisfied for all possible values of the unary potentials $\theta_i(\cdot)$ and clique potentials $\theta_c(\mathbf{x}_c)$:

$$\sum_{i \in \mathcal{V}} \theta_i(\hat{x}_i) + \sum_{c \in \mathcal{C}} \theta_c(\hat{\mathbf{x}}_c) \le \sum_{i \in \mathcal{V}} \theta_i(x_i^*) + B \sum_{c \in \mathcal{C}} \theta_c(\mathbf{x}_c^*) \tag{3}$$

Here, $\hat{\mathbf{x}}$ is the labeling estimated by the algorithm and $\mathbf{x}^*$ is a globally optimal labeling. By definition of an optimal labeling (one that has the minimum energy), the multiplicative bound will always be greater than or equal to one [16].

Multiplicative Bound for the $\alpha$-expansion algorithm for the $P^n$ Potts model. Using the $\alpha$-expansion algorithm for the $P^n$ Potts model we obtain the multiplicative bound of $\lambda \min(\mathcal{M}, |\mathcal{L}|)$, where $\mathcal{M}$ is the size of the largest maximal clique in the graph, $|\mathcal{L}|$ is the number of labels, and $\lambda$ is defined as below [11]:

$$\gamma_{\min} = \min_{k \in \mathcal{L}} \gamma_k, \qquad \lambda = \begin{cases} \cdots & \cdots \\ \cdots & \text{otherwise} \end{cases} \tag{4}$$

4. Parsimonious Labeling 

The parsimonious labeling problem is defined using an energy functional that consists of unary potentials and clique potentials defined over cliques of arbitrary sizes. While the parsimonious labeling problem places no restrictions on the unary potentials, the clique potentials are specified using a diversity function [4]. Before describing the parsimonious labeling problem in detail, we briefly define the diversity function for the sake of completeness.

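Definition 1. A diversity is a pair $(\mathcal{L}, \delta)$, where $\mathcal{L}$ is the set of labels and $\delta$ is a non-negative function defined on finite subsets of $\mathcal{L}$, $\delta : \Gamma \to \mathbb{R}, \forall \Gamma \subseteq \mathcal{L}$, satisfying the following properties:

• Non Negativity: $\delta(\Gamma) \ge 0$, and $\delta(\Gamma) = 0$ if and only if $|\Gamma| \le 1$.

• Triangular Inequality: if $\Gamma_2 \ne \emptyset$, $\delta(\Gamma_1 \cup \Gamma_2) + \delta(\Gamma_2 \cup \Gamma_3) \ge \delta(\Gamma_1 \cup \Gamma_3), \forall \Gamma_1, \Gamma_2, \Gamma_3 \subseteq \mathcal{L}$.

• Monotonicity: $\Gamma_1 \subseteq \Gamma_2$ implies $\delta(\Gamma_1) \le \delta(\Gamma_2)$.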
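The three properties of Definition 1 can be checked by brute force on a small label set. The sketch below is only an illustrative checker: the helper names are hypothetical, and the zero condition is interpreted as holding exactly for sets with at most one label. It enumerates all subsets, so it is practical only for a handful of labels.

```python
from itertools import chain, combinations

def subsets(labels):
    """All subsets of `labels`, including the empty set."""
    return [frozenset(s) for s in chain.from_iterable(
        combinations(labels, r) for r in range(len(labels) + 1))]

def is_diversity(labels, delta, tol=1e-12):
    all_sets = subsets(labels)
    for a in all_sets:
        # Non-negativity, with delta equal to zero exactly when |a| <= 1.
        if delta(a) < -tol:
            return False
        if (abs(delta(a)) <= tol) != (len(a) <= 1):
            return False
    for a in all_sets:
        for b in all_sets:
            # Monotonicity: a subset never has a larger diversity.
            if a <= b and delta(a) > delta(b) + tol:
                return False
            if not b:
                continue  # the triangle inequality requires a non-empty middle set
            for c in all_sets:
                if delta(a | b) + delta(b | c) < delta(a | c) - tol:
                    return False
    return True

labels = [0, 1, 2, 3]

def spread(s):
    """Largest absolute difference between two labels in the set."""
    return float(max(s) - min(s)) if s else 0.0

def squared_count(s):
    """A function that is monotonic but violates the triangle inequality."""
    return float(max(len(s) - 1, 0) ** 2)

spread_ok = is_diversity(labels, spread)
squared_ok = is_diversity(labels, squared_count)
```

The second function shows that monotonicity alone is not enough: it grows with the number of labels but fails the triangle inequality.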

Using a diversity function, we can define a clique potential as follows. We denote by $\Gamma(\mathbf{x}_c)$ the set of unique labels in the labeling of the clique $c$. Then, $\theta_c(\mathbf{x}_c) = w_c \delta(\Gamma(\mathbf{x}_c))$, where $\delta$ is a diversity function and $w_c$ is the non-negative weight corresponding to the clique $c$. Formally, the parsimonious labeling problem amounts to minimizing the following energy functional:

$$E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{c \in \mathcal{C}} w_c \delta(\Gamma(\mathbf{x}_c)) \tag{5}$$
Therefore, given a clique $\mathbf{x}_c$ and the set of unique labels $\Gamma(\mathbf{x}_c)$ assigned to the random variables in the clique, the clique potential function for the parsimonious labeling problem is defined using $\delta(\Gamma(\mathbf{x}_c))$, where $\delta : \Gamma(\mathbf{x}_c) \to \mathbb{R}$ is a diversity function.

Intuitively, diversities enforce parsimony by preferring, among a set of equally likely solutions, the one with fewer unique labels, which makes them highly interesting for the computer vision community. This is an essential property in many vision problems. For example, in the case of image segmentation, we would like to see label consistency within superpixels in order to preserve discontinuity. Unlike the $P^n$ Potts model, the diversity does not enforce label consistency very rigidly. Instead, the cost rises monotonically with the set of labels assigned to the given clique.

An important special case of the parsimonious labeling problem is the metric labeling problem, which has been extensively studied in computer vision [2] and theoretical computer science [12]. In metric labeling, the maximal cliques are of size two (pairwise) and the clique potential function is a metric distance function defined over the labels. Recall that a distance function $d : \mathcal{L} \times \mathcal{L} \to \mathbb{R}$ is a metric if and only if: (i) $d(\cdot, \cdot) \ge 0$; (ii) $d(i, j) + d(j, k) \ge d(i, k)$; and (iii) $d(i, j) = 0$ if and only if $i = j$.

Notice that there is a direct link between the metric distance function and the diversities. The diversities can be seen as metric distance functions over sets of arbitrary sizes. In other words, diversities are the generalization of the metric distance function and boil down to a metric distance function if the input set is restricted to the subsets with cardinality of at most two. Another way of understanding the connection between metrics and diversities is that every diversity induces a metric. In other words, consider $d(l_i, l_i) = \delta(\{l_i\})$ and $d(l_i, l_j) = \delta(\{l_i, l_j\})$. Using the properties of diversities, it can be shown that $d(\cdot, \cdot)$ is a metric distance function. Hence, in the case of an energy functional defined over pairwise cliques, the parsimonious labeling problem reduces to the metric labeling problem.

In the remaining part of this section we discuss a specific type of diversity called the diameter diversity, show its relation with the well known $P^n$ Potts model, and propose a hierarchical $P^n$ Potts model based on the diameter diversity defined over a hierarchical clustering (defined shortly). However, note that our approach is applicable to any general parsimonious labeling problem.

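Diameter diversity. Among many known diversities [3], in this work we are primarily interested in the diameter diversity. Let $(\mathcal{L}, \delta)$ be a diversity and $(\mathcal{L}, d)$ be the induced metric of $(\mathcal{L}, \delta)$, where $d : \mathcal{L} \times \mathcal{L} \to \mathbb{R}$ and $d(l_i, l_j) = \delta(\{l_i, l_j\}), \forall l_i, l_j \in \mathcal{L}$. Then, for all $\Gamma \subseteq \mathcal{L}$, the diameter diversity is defined as:

$$\delta^{dia}(\Gamma) = \max_{l_i, l_j \in \Gamma} d(l_i, l_j) \tag{6}$$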

Clearly, given the induced metric function defined over a set of labels, the diameter diversity over any subset of labels measures how dissimilar (or diverse) the labels are. The greater the dissimilarity under the induced metric, the higher the diameter diversity. Therefore, using the diameter diversity as clique potentials encourages similar labels to be grouped together. Thus, a special case of parsimonious labeling in which the clique potentials are of the form of the diameter diversity can be defined as below:

$$E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{c \in \mathcal{C}} w_c \delta^{dia}(\Gamma(\mathbf{x}_c)) \tag{7}$$

Notice that the diameter diversity defined over the uniform metric is nothing but the $P^n$ Potts model where $\gamma_k = 0$. In what follows we define a generalization of the $P^n$ Potts model, the hierarchical $P^n$ Potts model, which will play a key role in the rest of the paper.
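The diameter diversity and its relation to the $P^n$ Potts model can be illustrated with a short sketch. The function names and the example metrics below are assumptions made for illustration; any metric over the labels can be passed in as `d`.

```python
from itertools import combinations

def diameter_diversity(label_set, d):
    """Largest pairwise distance d(l_i, l_j) within the set of unique labels;
    zero when fewer than two distinct labels are present."""
    unique = set(label_set)
    return max((d(a, b) for a, b in combinations(unique, 2)), default=0.0)

def uniform_metric(a, b):
    return 0.0 if a == b else 1.0

def absolute_difference(a, b):
    return float(abs(a - b))

# Under the uniform metric the diversity is 0 for a consistent clique and 1
# otherwise, which is the P^n Potts model with gamma_k = 0 and gamma_max = 1.
consistent = diameter_diversity([2, 2, 2], uniform_metric)
inconsistent = diameter_diversity([0, 1, 2], uniform_metric)

# Under a non-uniform metric the cost grows with how far apart the labels are.
spread = diameter_diversity([0, 3, 1], absolute_difference)
```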

The Hierarchical $P^n$ Potts Model. The hierarchical $P^n$ Potts model is a diameter diversity defined over a special type of metric known as the r-HST metric. A rooted tree, as shown in figure (1), is said to be an r-HST, or r-hierarchically well separated [1], if it satisfies the following properties: (i) all the leaf nodes are the labels; (ii) all edge weights are positive; (iii) the edge lengths from any node to all of its children are the same; and (iv) on any root to leaf path the edge weights decrease by a factor of at least $r > 1$. We can think of an r-HST as a hierarchical clustering of the given label set $\mathcal{L}$. The root node represents the cluster at the top level of the hierarchy and contains all the labels. As we go down in the hierarchy, the clusters break down into smaller clusters until we get as many leaf nodes as the number of labels in the given label set. The metric distance function defined on this tree, $d^t(\cdot, \cdot)$, is known as the r-HST metric. In other words, the distance $d^t(\cdot, \cdot)$ between any two nodes in the given r-HST is the shortest path distance between these nodes in the tree. The diameter diversity defined over $d^t(\cdot, \cdot)$ is called the hierarchical $P^n$ Potts model. An example of a diameter diversity defined over an r-HST is given in figure (1).

Figure 1: An example of an r-HST for $r = 2$. The cluster associated with root $p$ contains all the labels. As we go down, the cluster splits into subclusters and finally we get the singletons, the leaf nodes (labels). The root is at depth $d = 1$ and the leaf nodes at $d = 3$. The metric defined over the r-HST is denoted as $d^t(\cdot, \cdot)$, the shortest path between the inputs. For example, $d^t(l_1, l_3) = 18$ and $d^t(l_1, l_2) = 6$. The diameter diversity for the subset of labels at cluster $p$ is $\delta^{dia}(\mathcal{L}^p) = 18$.
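To make the r-HST metric concrete, the sketch below encodes a small rooted tree in the style of figure (1) and computes shortest-path distances between its nodes. The node names and edge lengths (6 at the top level, 3 at the leaf level) are assumptions chosen to be consistent with the distances 6 and 18 quoted in the caption; the figure itself is not reproduced here.

```python
# Child -> parent map and the length of the edge from each node to its parent.
parent = {"p1": "p", "p2": "p",
          "l1": "p1", "l2": "p1", "l3": "p2", "l4": "p2"}
edge_to_parent = {"p1": 6.0, "p2": 6.0,
                  "l1": 3.0, "l2": 3.0, "l3": 3.0, "l4": 3.0}

def path_lengths_to_ancestors(node):
    """Map every ancestor of `node` (itself included) to the path length to it."""
    out, length = {}, 0.0
    while True:
        out[node] = length
        if node not in parent:          # reached the root
            return out
        length += edge_to_parent[node]
        node = parent[node]

def hst_distance(u, v):
    """Shortest-path distance in the tree: go up to the lowest common ancestor."""
    up_u, up_v = path_lengths_to_ancestors(u), path_lengths_to_ancestors(v)
    return min(up_u[a] + up_v[a] for a in up_u if a in up_v)

def satisfies_r_decrease(r):
    """Property (iv): every edge is at least r times longer than the edges
    directly below it."""
    return all(edge_to_parent[parent[n]] >= r * edge_to_parent[n]
               for n in parent if parent[n] in parent)

same_cluster = hst_distance("l1", "l2")
across_clusters = hst_distance("l1", "l3")
leaves = ["l1", "l2", "l3", "l4"]
root_diameter = max(hst_distance(a, b) for a in leaves for b in leaves)
```

The diameter at the root equals the distance between any two labels from different top-level clusters, which is why the hierarchical $P^n$ Potts cost of a clique is determined by the highest cluster its labels span.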

5. The Hierarchical Move Making Algorithm 

In the first part of this section we propose a move making algorithm for the hierarchical $P^n$ Potts model (defined in the previous section). In the second part, we show how our hierarchical move making algorithm can be used to minimize the much more general parsimonious labeling problem with optimality guarantees (tight multiplicative bound).

5.1. The Hierarchical Move Making Algorithm for the Hierarchical $P^n$ Potts Model

In the hierarchical $P^n$ Potts model the clique potentials are of the form of the diameter diversity defined over a given r-HST metric function. The move making algorithm proposed in this section to minimize such an energy functional is a divide-and-conquer based approach, inspired by the work of [17]. Instead of solving the actual problem, we divide the problem into smaller subproblems where each subproblem amounts to solving $\alpha$-expansion for the $P^n$ Potts model [13]. More precisely, given an r-HST, each node of the r-HST corresponds to a subproblem. We start at the bottom level of the r-HST, which consists of the leaf nodes, and go up in the hierarchy solving each subproblem associated with the nodes encountered.

Algorithm 1 The Move Making Algorithm for the Hierarchical $P^n$ Potts Model.

input r-HST metric; $w_c, \forall c \in \mathcal{C}$; and $\theta_i(x_i), \forall i \in \mathcal{V}$
 1: $d = D$, the leaf nodes
 2: repeat
 3:   for each $p \in \mathcal{N}(d)$ do
 4:     if $|\eta(p)| = 0$, leaf node then
 5:       $x_i^p = p, \forall i \in \mathcal{V}$
 6:     else
 7:       Fusion Move
            $\hat{\mathbf{t}}^p = \operatorname{argmin}_{\mathbf{t}^p \in \{1, \cdots, |\eta(p)|\}^N} E(\mathbf{t}^p)$   (8)
 8:       $x_i^p = x_i^{\eta(p, \hat{t}_i^p)}$
 9:     end if
10:   end for
11:   $d \leftarrow d - 1$
12: until $d < 1$.

Figure 2: An example of solving the labeling problem at a non-leaf node ($p$) by combining the solutions of its child nodes $\{p_1, p_2\}$, given clique $c$ and the labelings that it has obtained at the child nodes. Note that the hierarchical clustering shown in this figure is the top two levels of the r-HST shown in figure (1), for a given clique $c$. The diameter diversity of the labeling of clique $c$ at node $p_1$ is 0 as it contains only one unique label $l_1$. The diameter diversity of the labeling at $p_2$ is $d^t(l_3, l_4) = 6$ and that of the label set at $p$ is 18.

In more detail, consider a node $p$ of the given r-HST. Recall that any node $p$ in the r-HST represents a cluster of labels denoted as $\mathcal{L}^p \subseteq \mathcal{L}$ (figure 1). In other words, the leaf nodes of the subtree rooted at $p$ belong to $\mathcal{L}^p$. Thus, the subproblem defined at node $p$ is to find the labeling $\mathbf{x}^p$ where the label set is restricted to $\mathcal{L}^p$, as defined below.

$$\mathbf{x}^p = \operatorname{argmin}_{\mathbf{x} \in (\mathcal{L}^p)^N} \left( \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{c \in \mathcal{C}} w_c \delta^{dia}(\Gamma(\mathbf{x}_c)) \right) \tag{9}$$


If $p$ is the root node, then the above problem (equation 9) is as difficult as the original labeling problem (since $\mathcal{L}^p = \mathcal{L}$). However, if $p$ is a leaf node then the solution of the problem associated with $p$ is trivial: $x_i^p = p$ for all $i \in \mathcal{V}$, which means assigning the label $p$ to all the random variables. This insight leads to the design of our approximation algorithm, where we start by solving the simple problems corresponding to the leaf nodes, and use the labelings obtained to address the more difficult problems further up the hierarchy. In what follows, we describe how the labeling of the problem associated with the node $p$, when $p$ is not a leaf node, is obtained using the labelings of its child nodes.


Solving the Parent Labeling Problem. Before delving into the details, let us define some notation for clarity. Let $D$ be the depth (or the number of levels) of the given r-HST, with the root node at the top level (depth one). Let $\eta(p)$ denote the set of child nodes associated with a non-leaf node $p$ and $\eta(p, k)$ denote its $k$-th child node. Recall that our approach is bottom-up; therefore, for each child node of $p$ we already have a labeling associated with it. We denote the labeling associated with the $k$-th child of the node $p$ as $\mathbf{x}^{\eta(p,k)}$. Thus, $x_i^{\eta(p,k)}$ denotes the label assigned to the $i$-th random variable by the labeling of the $k$-th child of the node $p$. We also define an $N$ dimensional vector $\mathbf{t}^p$, where each index of the vector can take a value from the set of child indices of node $p$, $\{1, \cdots, |\eta(p)|\}$, where $|\eta(p)|$ denotes the number of child nodes of $p$. More precisely, $t_i^p = k$ denotes that the label for the $i$-th random variable comes from the $k$-th child of node $p$. Therefore, the labeling problem at node $p$ reduces to finding the optimal $\mathbf{t}^p$, that is, finding the best child index $k \in \{1, \cdots, |\eta(p)|\}$ for each random variable $i \in \mathcal{V}$ so that the label assigned to the random variable comes from the labeling of the $k$-th child.

Using the above notation, for a given $\mathbf{t}^p$ we define a new energy functional as:

$$E(\mathbf{t}^p) = \sum_{i \in \mathcal{V}} \bar{\theta}_i(t_i^p) + \sum_{c \in \mathcal{C}} w_c \bar{\theta}_c(\mathbf{t}_c^p) \tag{10}$$

where

$$\bar{\theta}_i(t_i^p) = \theta_i\left(x_i^{\eta(p,k)}\right) \quad \text{if } t_i^p = k \tag{11}$$

which says that the unary potential for $t_i^p = k$ is the unary potential associated to the $i$-th random variable corresponding to the label $x_i^{\eta(p,k)}$.

The new clique potential $\bar{\theta}_c(\mathbf{t}_c^p)$ is as defined below:

$$\bar{\theta}_c(\mathbf{t}_c^p) = \begin{cases} \gamma_k^p & \text{if } t_i^p = k, \forall i \in c \\ \gamma_{\max}^p & \text{otherwise} \end{cases} \tag{12}$$

where $\gamma_k^p = \delta^{dia}(\Gamma(\mathbf{x}_c^{\eta(p,k)}))$ is the diameter diversity of the set of unique labels associated with $\mathbf{x}_c^{\eta(p,k)}$, and $\gamma_{\max}^p = \delta^{dia}(\mathcal{L}^p)$ is the diameter diversity of the set of labels associated with the cluster at node $p$. Recall that, because of the construction of the r-HST, $\mathcal{L}^q \subset \mathcal{L}^p$ for all $q \in \eta(p)$. Hence, the monotonicity property of the diameter diversity ensures that $\gamma_{\max}^p \ge \gamma_k^p, \forall k \in \eta(p)$. This is the sufficient criterion to prove that the potential function defined by equation (12) is a $P^n$ Potts model. Therefore, the $\alpha$-expansion algorithm can be used to obtain the locally optimal $\hat{\mathbf{t}}^p$ for the energy functional (10). Once we have obtained the locally optimal $\hat{\mathbf{t}}^p$, the labeling at node $p$ can be trivially obtained as follows: $x_i^p = x_i^{\eta(p, \hat{t}_i^p)}$, which says that the final label of the $i$-th random variable is the one assigned to it by the labeling of the $\hat{t}_i^p$-th child of the node $p$.

Figure (2) shows an instance of the above mentioned algorithm to combine the labelings of the child nodes to obtain the labeling of the parent node. The complete hierarchical move making algorithm for the hierarchical $P^n$ Potts model is shown in Algorithm-1.
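The bottom-up procedure can be sketched on a toy problem as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree and edge lengths mimic figure (1), the unary potentials and the clique weight are made-up numbers, and the fusion step at each non-leaf node is solved by exhaustive search over the child-index vector instead of the graph-cut based $\alpha$-expansion used in the paper. The fusion energy follows equations (10) to (12).

```python
from itertools import combinations, product

# Toy r-HST: labels 0..3 are leaves; edge lengths are assumed values.
children = {"p": ["p1", "p2"], "p1": [0, 1], "p2": [2, 3]}
edge = {"p1": 6.0, "p2": 6.0, 0: 3.0, 1: 3.0, 2: 3.0, 3: 3.0}
parent = {c: p for p, cs in children.items() for c in cs}

def lengths_to_ancestors(node):
    out, d = {}, 0.0
    while True:
        out[node] = d
        if node not in parent:
            return out
        d += edge[node]
        node = parent[node]

def dist(a, b):
    ua, ub = lengths_to_ancestors(a), lengths_to_ancestors(b)
    return min(ua[n] + ub[n] for n in ua if n in ub)

def diameter(labels):
    return max((dist(a, b) for a, b in combinations(set(labels), 2)), default=0.0)

def leaves(node):
    if node not in children:
        return [node]
    return [leaf for ch in children[node] for leaf in leaves(ch)]

def fusion_energy(t, child_x, gamma_max, unary, cliques, weights):
    """E(t^p): unaries of the chosen child labels plus P^n Potts clique terms."""
    total = sum(unary[i][child_x[k][i]] for i, k in enumerate(t))
    for c, w in zip(cliques, weights):
        picks = {t[i] for i in c}
        if len(picks) == 1:                   # whole clique taken from one child
            k = picks.pop()
            total += w * diameter(child_x[k][i] for i in c)
        else:                                 # clique mixes several children
            total += w * gamma_max
    return total

def solve(node, unary, cliques, weights):
    """Labeling for the subtree rooted at `node`, built from its children."""
    n = len(unary)
    if node not in children:                  # leaf: assign its label everywhere
        return (node,) * n
    child_x = [solve(ch, unary, cliques, weights) for ch in children[node]]
    gamma_max = diameter(leaves(node))
    # Exhaustive search stands in for alpha-expansion on this tiny instance.
    best_t = min(product(range(len(child_x)), repeat=n),
                 key=lambda t: fusion_energy(t, child_x, gamma_max,
                                             unary, cliques, weights))
    return tuple(child_x[best_t[i]][i] for i in range(n))

# Hypothetical instance: 3 variables, 4 labels, one clique with weight 0.5.
unary = [[0.0, 5.0, 7.0, 8.0], [5.0, 0.0, 7.0, 8.0], [5.0, 6.0, 9.0, 0.0]]
cliques, weights = [(0, 1, 2)], [0.5]
labeling = solve("p", unary, cliques, weights)
```

In this instance the third variable would prefer label 3 on its unary term alone, but mixing labels from both top-level clusters costs the full root diameter, so the fused labeling stays within one cluster.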

Multiplicative Bound. Theorem 1 gives the multiplicative bound for the move making algorithm for the hierarchical $P^n$ Potts model.

Theorem 1. The move making algorithm for the hierarchical $P^n$ Potts model, Algorithm-1, gives the multiplicative bound of $\left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|)$ with respect to the global minima. Here, $\mathcal{M}$ is the size of the largest maximal clique and $|\mathcal{L}|$ is the number of labels.

Proof: Given in the Appendix.

5.2. The Move Making Algorithm for the Parsimonious Labeling

In the previous subsection, we proposed a hierarchical move making algorithm for the hierarchical $P^n$ Potts model. This restricted us to a very limited class of clique potentials. In this section we generalize our approach to the much more general parsimonious labeling problem.

The move making algorithm for the parsimonious labeling problem is shown in Algorithm-2. Given diversity based clique potentials, clique weights, and the unary potentials, Algorithm-2 approximates the diversity by a mixture of hierarchical $P^n$ Potts models and then uses the previously defined hierarchical move making algorithm on each of the hierarchical $P^n$ Potts models.

Algorithm 2 The Move Making Algorithm for the Parsimonious Labeling Problem.

input Diversity $(\mathcal{L}, \delta)$; $w_c, \forall c \in \mathcal{C}$; $\theta_i(x_i), \forall i \in \mathcal{V}$; $\mathcal{L}$; $k$
1: Approximate the given diversity as the mixture of $k$ hierarchical $P^n$ Potts models using Algorithm-3.
2: for each hierarchical $P^n$ Potts model in the mixture do
3:   Use the hierarchical move making algorithm defined in Algorithm-1.
4:   Compute the energy corresponding to the solution obtained.
5: end for
6: Choose the solution with the minimum energy.

Algorithm 3 Diversity to Mixture of Hierarchical $P^n$ Potts Models.

input Diversity $(\mathcal{L}, \delta)$, $k$
1: Compute the induced metric $d(\cdot, \cdot)$, where $d(l_i, l_j) = \delta(\{l_i, l_j\})$.
2: Approximate $d(\cdot, \cdot)$ by a mixture of $k$ r-HST metrics $d^t(\cdot, \cdot)$ using the algorithm proposed in [9].
3: for each r-HST metric $d^t(\cdot, \cdot)$ do
4:   Obtain the corresponding hierarchical $P^n$ Potts model by defining the diameter diversity over $d^t(\cdot, \cdot)$.
5: end for

The algorithm for approximating a given diversity by a mixture of hierarchical $P^n$ Potts models is shown in Algorithm-3. The first and the third steps of Algorithm-3 have already been discussed in the previous sections. The second step, which amounts to finding the mixture of r-HST metrics for a given metric, can be solved using the randomized algorithm proposed in [9]. We refer the reader to [9] for further details of the algorithm for approximating a metric using a mixture of r-HST metrics.

Multiplicative Bound. Theorem 2 gives the multiplicative bound for the parsimonious labeling problem, when the clique potentials are any general diversity.

Theorem 2. The move making algorithm defined in Algorithm-2 gives the multiplicative bound of $\left(\frac{r}{r-1}\right)(|\mathcal{L}| - 1)\, O(\log |\mathcal{L}|) \min(\mathcal{M}, |\mathcal{L}|)$ for the parsimonious labeling problem (equation 5). Here, $\mathcal{M}$ is the size of the largest maximal clique and $|\mathcal{L}|$ is the number of labels.

Proof: Given in the Appendix.

6. Experiments 

We demonstrate the utility of parsimonious labeling on both synthetic and real data. In the case of synthetic data, we perform a significant number of random experiments on big grid lattices and evaluate our method based on the energy and the time taken. To evaluate the modeling capabilities of parsimonious labeling, we used it on two challenging real problems: (i) stereo matching, and (ii) image inpainting. We use the co-occurrence statistics based energy functional proposed by Ladicky et al. [18] as our baseline. Theoretically, the only constraint that [18] enforces on the clique potentials is that they must be monotonic in the label set. Therefore, their model can be regarded as a generalization of parsimonious labeling. However, based on the synthetic and the real data results, supported by the theoretical guarantees, we show that parsimonious labeling and the move making algorithm proposed in this work outperform the more general work proposed in [18].

Recall that the energy functional of the parsimonious labeling problem is defined as:

$$E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{c \in \mathcal{C}} w_c \delta(\Gamma(\mathbf{x}_c)) \tag{13}$$

In our experiments, we frequently use the truncated linear metric. We define it below for the sake of completeness.

$$\theta_{i,j}(l_a, l_b) = \lambda \min(|l_a - l_b|, M), \quad \forall l_a, l_b \in \mathcal{L} \tag{14}$$

where $\lambda$ is the weight associated with the metric and $M$ is the truncation constant.
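The truncated linear metric of equation (14) and the diameter diversity it induces can be written down directly. In the sketch below, labels are assumed to be integers, and the default values of `lam` and `M` are illustrative rather than taken from the experiments. Because the truncated linear metric is non-decreasing in the label difference, the diameter of a label set is attained by its smallest and largest labels.

```python
from itertools import product

def truncated_linear(la, lb, lam=1.0, M=5):
    """Equation (14): lam * min(|la - lb|, M)."""
    return lam * min(abs(la - lb), M)

def diameter_diversity(labels, lam=1.0, M=5):
    """Diameter diversity induced by the truncated linear metric."""
    unique = set(labels)
    if not unique:
        return 0.0
    return truncated_linear(min(unique), max(unique), lam, M)

# Sanity check of the triangle inequality on a 20-label set.
triangle_holds = all(
    truncated_linear(a, b) + truncated_linear(b, c) >= truncated_linear(a, c)
    for a, b, c in product(range(20), repeat=3))

near = truncated_linear(2, 4)
far = truncated_linear(0, 19)
clique_cost = diameter_diversity([3, 3, 4, 10])
```

The truncation caps the penalty for very different labels, so a clique that straddles a genuine discontinuity is not penalized without bound.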

6.1. Synthetic Data 

We consider following two cases: (i) when the hierarchi¬ 
cal Potts model is given, and (ii) when a general diver¬ 
sity is given. In each of the two cases, we generate lattices 
of size 100 X 100, 20 labels, and use A = 1. The cliques 
are generated using a window of size 10 x 10 in a sliding 
window fashion. The unary potentials were randomly sam¬ 
pled from the uniform distribution defined over the interval 
[0,100]. In the first case, we randomly generated 100 lat¬ 
tices and random r-HST trees associated with each lattice, 
ensuring that they satisfy the properties of the r-HST. Each 
r-HST was then converted into hierarchical Potts model 
by taking diameter diversity over each of them. This hierar¬ 
chical P^ Potts model was then used as the actual clique po¬ 
tential. We performed 100 such experiments. On the other 
hand, in the second case, for a given value of the truncation 
M, we generated a truncated linear metric and 100 lattices. 
We treated this metric as the induced metric of a diame¬ 
ter diversity and generated mixture of hierarchical P^ Potts 
model using Algorithm-3. Applied Algorithm- 1 for the en¬ 
ergy minimization over each hierarchical P^ Potts model 
in the mixture and chose the one with the minimum energy. 
Notice that, in this case, the actual potential is the given 
diversity, not the generated hierarchical P^ Potts models. 


Thus, the co-occurrence method [18] was given the actual diversity as the clique potentials. The methods were evaluated using the given diversity as the clique potentials. We used four different values of the truncation factor, $M \in \{1, 5, 10, 20\}$. For both experiments, we used seven different values of $w_c$: $w_c \in \{0, 1, 2, 3, 4, 5, 100\}$.
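The synthetic setup described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the reported experiments: the stride of the sliding window is not stated in the text, so a stride of one pixel is assumed, and the function names are hypothetical.

```python
import numpy as np


def sliding_window_cliques(height, width, window):
    """Return one clique (flat pixel indices) per window position, stride 1 (assumed)."""
    index = np.arange(height * width).reshape(height, width)
    cliques = []
    for r in range(height - window + 1):
        for c in range(width - window + 1):
            cliques.append(index[r:r + window, c:c + window].ravel())
    return cliques


def make_synthetic_problem(height=100, width=100, n_labels=20, window=10, seed=0):
    """Random unaries drawn uniformly from [0, 100] plus sliding-window cliques."""
    rng = np.random.default_rng(seed)
    unaries = rng.uniform(0.0, 100.0, size=(height * width, n_labels))
    cliques = sliding_window_cliques(height, width, window)
    return unaries, cliques
```

Under the stride-one assumption, the default arguments give 91 × 91 cliques of 100 pixels each on a 100 × 100 lattice.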

The average energy and the time taken by both methods in both cases are shown in Figure 3. The figures show that our method outperforms co-occurrence [18] in both cases, in terms of both time and energy. When the hierarchical $P^n$ Potts model is given, case (i), our method performs much better than co-occurrence [18] because it directly minimizes the given potential. In case (ii), even though our method first approximates the given diversity by a mixture of hierarchical $P^n$ Potts models, it still outperforms co-occurrence [18]. This is consistent with the very tight multiplicative bound of our algorithm.

6.2. Real Data 

In the case of real data, the high-order cliques we used are the superpixels obtained using the mean-shift method [5]. The clique potentials used for the experiments are the diameter diversity of the truncated linear metric (equation (14)). A truncated linear metric enforces smoothness in the pairwise setting; therefore, the diameter diversity of the truncated linear metric naturally enforces smoothness in the high-order cliques, which is a desired cue for the two applications we are dealing with. In both real experiments we used the following form of $w_c$ (for the high-order cliques):

\[
w_c = \exp\left(-\frac{\rho(\mathbf{x}_c)}{\sigma^2}\right),
\]

where $\rho(\mathbf{x}_c)$ is the variance of the intensities of the pixels in the clique $\mathbf{x}_c$ and $\sigma$ is a hyperparameter.
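As an illustration of the high-order clique weight, the sketch below computes $w_c$ from the pixel intensities of one superpixel. The exact functional form is an assumption: it uses $w_c = \exp(-\rho(\mathbf{x}_c)/\sigma^2)$, which matches the statement that $\rho$ is the intensity variance and $\sigma$ a hyperparameter, and the observation in the appendix that increasing $\sigma$ increases $w_c$.

```python
import numpy as np


def clique_weight(intensities, sigma):
    """Assumed form w_c = exp(-variance / sigma**2) for one clique's pixel intensities."""
    rho = float(np.var(np.asarray(intensities, dtype=float)))
    return float(np.exp(-rho / sigma ** 2))
```

With this form, a homogeneous superpixel receives the maximum weight of one, while a superpixel with high intensity variance is penalized less for using several labels.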

6.2.1 Stereo Matching 

Given a rectified stereo pair of images, the problem of stereo matching is to find the disparity (which gives the notion of depth) of each pixel in the reference image [23, 22]. In this work, we extended the standard setting of stereo matching [22] to high-order cliques and tested our method on the images ‘tsukuba’ and ‘teddy’ from the widely used Middlebury stereo data set [21]. The unaries were computed as the $L_1$-norm of the difference in the RGB values of the left and the right image pixels. Notice that the index of the right image pixel is the index of the left image pixel minus the disparity, which is the label. In the case of ‘teddy’, the unaries were truncated at 16. The weights $w_c$ for the pairwise cliques are set according to the $L_1$-norm of the gradient $\nabla$ of the intensities of the neighbouring pixels. In the case of ‘tsukuba’, if $\nabla < 8$, $w_c = 2$, otherwise $w_c = 1$. In the case of ‘teddy’, if $\nabla < 10$, $w_c = 3$, otherwise $w_c = 1$.
As mentioned earlier, $w_c$ for the high-order cliques is set
according to the variance of the pixel intensities in the clique. We used different values of $\sigma$, $\lambda$, and the truncation $M$. Because of space constraints, we show results for the following setting: for ‘tsukuba’, $\lambda = 20$, $\sigma = 100$ and $M = 10$; for ‘teddy’, $\lambda = 10$, $\sigma = 1000$ and $M = 1$. Figure 4 shows the results obtained. Notice that our method significantly outperforms the co-occurrence based method [18] in terms of energy for both ‘tsukuba’ and ‘teddy’. We show similar promising results for different parameters in the Appendix.

Figure 3: Synthetic (Blue: Our, Red: Co-occ [18]). The x-axis for all the figures is the weight associated with the cliques ($w_c$). Figures (a) and (b) are the plots for the energy and the time when the hierarchical $P^n$ Potts model was assumed to be known. Figures (c) and (d) are the energy and the time plots for the case when a diversity (diameter diversity over truncated linear metric) was given as the clique potentials. Notice that in both cases our method outperforms the baseline [18] both in terms of energy and time. Also, for the very high value $w_c = 100$, both methods converge to the same labeling. This is expected, as a very high value of $w_c$ enforces rigid smoothness by assigning everything to the same label.

[Figure 4 panels: (a) Tsukuba, (b) Our, (c) Co-occ [18], (d) Teddy, (e) Our, (f) Co-occ [18]. (Energy, Time): (b) (1195800, 167), (c) (2202500, 95), (e) (1511206, 287), (f) (1519500, 605).]

Figure 4: Stereo matching results. Figures (a) and (d) are the ground truth disparity for ‘tsukuba’ and ‘teddy’, respectively. Notice that our method outperforms the baseline Co-occ [18] in both cases in terms of energy. From figures (b) and (e), we can clearly see the effect of ‘parsimonious labeling’, as the regions are smooth and the discontinuities are preserved.

[Figure 5 panels: (a) Penguin, (b) Our, (c) Co-occ [18], (d) House, (e) Our, (f) Co-occ [18]. (Energy, Time): (b) (12516336, 156), (c) (14711806, 110), (e) (32799162, 1014), (f) (38597848, 367).]

Figure 5: Image inpainting results. Figures (a) and (d) are the input images of ‘penguin’ and ‘house’ with added noise and obscured regions. Our method, (b) and (e), outperforms the baseline Co-occ [18] in both cases in terms of energy. Figure (b) clearly shows the effect of ‘parsimonious labeling’, as the regions are smooth and the discontinuities are preserved.
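The stereo unaries and pairwise weights of Section 6.2.1 can be sketched as follows. This is a minimal illustration under stated assumptions: images are 8-bit RGB arrays, and left-image pixels whose match would fall outside the right image receive the largest possible cost (the text does not say how these border pixels are handled). The function names are hypothetical.

```python
import numpy as np


def stereo_unaries(left, right, n_disparities, truncation=None):
    """Cost volume of shape (H, W, n_disparities).

    Entry [y, x, d] is the L1 norm of left[y, x] - right[y, x - d],
    i.e. the right pixel index is the left index minus the disparity.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w, _ = left.shape
    # Assumption: out-of-image matches get the maximum cost for 8-bit RGB.
    fill = 3 * 255.0 if truncation is None else float(truncation)
    cost = np.full((h, w, n_disparities), fill)
    for d in range(n_disparities):
        diff = np.abs(left[:, d:, :] - right[:, :w - d, :]).sum(axis=2)
        cost[:, d:, d] = diff
    if truncation is not None:
        cost = np.minimum(cost, truncation)
    return cost


def pairwise_weight(gradient, threshold, low_gradient_weight):
    """w_c = low_gradient_weight if the gradient is below the threshold, else 1."""
    return low_gradient_weight if gradient < threshold else 1.0
```

For example, `stereo_unaries(left, right, n_disparities, truncation=16)` mirrors the truncation used for ‘teddy’, and `pairwise_weight(g, 8, 2.0)` mirrors the ‘tsukuba’ setting.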


6.2.2 Image Inpainting and Denoising 

Given an image with added noise and obscured regions (regions with missing pixels), the problem is to denoise the image and fill the obscured regions so that they are consistent with the surroundings. We performed this experiment on the images ‘penguin’ and ‘house’ from the widely used Middlebury data set. The images under consideration are gray scale; therefore, there are 256 labels in the interval [0, 255], each representing an intensity value. The unary for each pixel (or node) and a particular label is the squared difference between the label and the intensity value at that pixel. The weights $w_c$ for the pairwise cliques are all set to one. For the high-order cliques, as mentioned earlier, $w_c$ is chosen according to the variance of the intensities of the participating pixels. We used different values of $\sigma$, $\lambda$, and the truncation $M$. Because of space constraints, we show results for the following setting: for ‘penguin’, $\lambda = 40$, $\sigma = 10000$ and $M = 40$; for ‘house’, $\lambda = 30$, $\sigma = 10$ and $M = 40$. Figure 5 shows the results obtained. Notice that our method significantly outperforms the co-occurrence based method [18] in terms of energy for both ‘penguin’ and ‘house’. We show similar promising results for different parameters in the Appendix.
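The denoising unaries described above can be written compactly with NumPy broadcasting. The sketch below is illustrative only; in particular, giving obscured pixels a zero unary for every label is an assumption made here (a common convention), since the text does not specify how missing pixels are treated.

```python
import numpy as np


def denoising_unaries(image, mask=None, n_labels=256):
    """Cost volume of shape (H, W, n_labels): (label - observed intensity) ** 2.

    mask: optional boolean array, True where the pixel is obscured.
    Assumption: obscured pixels carry no data term (zero cost for all labels).
    """
    image = np.asarray(image, dtype=float)
    labels = np.arange(n_labels, dtype=float)
    cost = (labels[None, None, :] - image[:, :, None]) ** 2
    if mask is not None:
        cost[np.asarray(mask, dtype=bool)] = 0.0
    return cost
```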

7. Discussion 

We proposed a new family of discrete optimization problems, parsimonious labeling, a novel hierarchical $P^n$ Potts model, and move making algorithms to minimize the corresponding energy functionals. We gave very tight multiplicative bounds for the move making algorithms, applicable to all ‘diversities’. An interesting direction for future research would be to explore different ‘diversities’ and propose algorithms specific to them with better bounds. Another interesting direction would be to directly approximate ‘diversities’ by a mixture of hierarchical $P^n$ Potts models, without using the intermediate r-HST.
A. Additional Real Data Experiments and Analysis

Recall that the energy functional of the parsimonious labeling problem is defined as:

\[
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{c \in \mathcal{C}} w_c\, \delta(\Gamma(\mathbf{x}_c)) \tag{15}
\]

where $\delta(\cdot)$ is the diversity function defined over the set of unique labels present in the clique $\mathbf{x}_c$. In our experiments, we frequently use the truncated linear metric. We define it below for the sake of completeness.

\[
\theta_{i,j}(l_a, l_b) = \lambda \min(|l_a - l_b|, M), \quad \forall l_a, l_b \in \mathcal{L}, \tag{16}
\]

where $\lambda$ is the weight associated with the metric and $M$ is the truncation constant.

In the case of real data, the high-order cliques are defined over the superpixels obtained using the mean-shift method [5]. The clique potentials used for the experiments are the diameter diversity of the truncated linear metric. A truncated linear metric (equation (16)) enforces smoothness in the pairwise setting; therefore, the diameter diversity of the truncated linear metric naturally enforces smoothness in the high-order cliques, which is a desired cue for the two applications we are dealing with.

In all the real experiments we use the following form of $w_c$ (for the high-order cliques): $w_c = \exp\left(-\rho(\mathbf{x}_c)/\sigma^2\right)$, where $\rho(\mathbf{x}_c)$ is the variance of the intensities of the pixels in the clique $\mathbf{x}_c$ and $\sigma$ is a hyperparameter.

In order to show the modeling capabilities of parsimonious labeling, we compare our results with the well known $\alpha$-expansion [25], TRWS [15], and Co-occ [18]. We also show the effect of the clique sizes, which in our case are the superpixels obtained using the mean-shift algorithm, and of the parameter $w_c$ associated with the cliques, in order to understand the behaviour of parsimonious labeling.

A.1. Stereo Matching

Please refer to the paper for the description of the stereo matching problem. Figures 6 and 7 show the comparisons between the different methods for the ‘teddy’ and ‘tsukuba’ examples, respectively. It can be clearly seen that parsimonious labeling gives better results than the other three methods. The parameter $w_c$ can be thought of as the trade-off between the influence of the pairwise and the high-order cliques. Finding the best setting of $w_c$ is very important. The effect of the parameter $w_c$, which is varied by changing $\sigma$, is shown in Figure 8. Similarly, the cliques have a great impact on the overall result. Large cliques and a high value of $w_c$ result in over-smoothing. In order to visualize this, we show the effect of the clique size in Figure 9.

[Figure 6 panels: (a) Gnd Truth, (b) $\alpha$-exp, (c) TRWS, (d) Co-occ, (e) Our Method.]

Figure 6: Comparison of all the methods for the stereo matching of ‘teddy’. We used the optimal setting of the parameters proposed on the well known Middlebury webpage and in [22]. The above results are obtained using $\sigma$ = 10^ for Co-occ and our method. Clearly, our method gives much smoother results while keeping the underlying shape intact. This is because of the cliques and the corresponding potentials (diversities) used. The diversities enforce smoothness over the cliques, while $\sigma$ controls this smoothness in order to avoid over-smooth results.


A.2. Image Inpainting and Denoising

Please refer to the paper for the description of the image inpainting and denoising problem. Figures 10 and 11 show the comparisons between the different methods for the ‘penguin’ and the ‘house’ examples, respectively. It can be clearly seen that parsimonious labeling gives highly promising results compared to all the other methods.

[Figure 7 panels: (a) Gnd Truth, (b) $\alpha$-exp, (c) TRWS, (d) Co-occ, (e) Our Method.]

Figure 7: Comparison of all the methods for the stereo matching of ‘tsukuba’. We used the optimal setting of the parameters proposed on the well known Middlebury webpage and in [22]. The above results are obtained using $\sigma$ = 10^ for Co-occ and our method. We can see that the disparity obtained using our method is closest to the ground truth compared to all other methods. In our method, the background is uniform (under the table also), the camera shape is closest to the ground truth camera, and the face disparity is also closest to the ground truth compared to other methods.

Figure 8: Effect of $\sigma$ in parsimonious labeling. All the parameters are the same except for $\sigma$. Note that as we increase $\sigma$, $w_c$ increases, which in turn results in over-smoothing.

Figure 9: Effect of clique size (superpixels). The top row shows the cliques (superpixels) used and the bottom row shows the stereo matching using these cliques. As we go from left to right, the minimum number of pixels that a superpixel must contain increases. All the other parameters are the same. In order to increase the weight $w_c$, we use a high value of $\sigma$, which is $\sigma$ = 10^ in all the above cases.


B. Proof of Theorems 

The labeling problem. As already defined in the paper, consider a random field defined over a set of random variables $\mathbf{x} = \{x_1, \cdots, x_N\}$ arranged in a predefined lattice $\mathcal{V} = \{1, \cdots, N\}$. Each random variable can take a value from a discrete label set $\mathcal{L} = \{l_1, \cdots, l_H\}$. The energy functional corresponding to a labeling $\mathbf{x}$ is defined as:

\[
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{c \in \mathcal{C}} \theta_c(\mathbf{x}_c) \tag{17}
\]

where $\theta_i(x_i)$ is any arbitrary unary potential, and $\theta_c(\mathbf{x}_c)$ is a clique potential for assigning the labels $\mathbf{x}_c$ to the variables in the clique $c$.

Notations. $\Gamma(\mathbf{x}_c)$ denotes the set of unique labels present in the clique $\mathbf{x}_c$. $\delta(\Gamma(\mathbf{x}_c))$ and $\delta^{dia}(\Gamma(\mathbf{x}_c))$ denote the diversity and the diameter diversity of the unique labels present in the clique $\mathbf{x}_c$, respectively. $\mathcal{M} = \max_c |\mathbf{x}_c|$ is the size of the largest maximal clique and $|\mathcal{L}|$ is the number of labels.

[Figure 10 panels: (a) Original, (b) Input, (c) $\alpha$-exp, (d) TRWS, (e) Co-occ, (f) Our.]

Figure 10: Comparison of all the methods for the image inpainting and denoising problem of the ‘penguin’. Notice that our method recovers the hand of the penguin very smoothly. In the other methods, except Co-occ, the ground is over-smooth, while our method recovers the ground quite well compared to the others.

[Figure 11 panels: (a) Original, (b) Input, (c) $\alpha$-exp, (d) TRWS, (e) Co-occ, (f) Our.]

Figure 11: Comparison of all the methods for the image inpainting and denoising problem of the ‘house’.

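B.1. Multiplicative Bound of the Hierarchical Move Making Algorithm for the Hierarchical $P^n$ Potts Model - Proof of Theorem-1

Proof. Let $\mathbf{x}^*$ be the optimal labeling of the given hierarchical $P^n$ Potts model based labeling problem. Note that any node $p$ in the underlying r-HST represents a cluster (subset) of labels. For each node $p$ in the r-HST we define the following sets using $\mathbf{x}^*$:

\[
\begin{aligned}
\mathcal{L}^p &= \{l_i \mid l_i \in \mathcal{L},\ i \in p\},\\
\mathcal{V}^p &= \{x_i : x_i^* \in \mathcal{L}^p\},\\
\mathcal{I}^p &= \{c : \mathbf{x}_c \subseteq \mathcal{V}^p\},\\
\mathcal{B}^p &= \{c : \mathbf{x}_c \cap \mathcal{V}^p \neq \emptyset,\ \mathbf{x}_c \not\subseteq \mathcal{V}^p\},\\
\mathcal{O}^p &= \{c : \mathbf{x}_c \cap \mathcal{V}^p = \emptyset\}.
\end{aligned}
\tag{18}
\]

In other words, $\mathcal{L}^p$ is the set of labels in the cluster at node $p$, $\mathcal{V}^p$ is the set of nodes whose optimal label lies in the subtree rooted at $p$, $\mathcal{I}^p$ is the set of cliques such that the optimal labeling lies in the subtree rooted at $p$, $\mathcal{B}^p$ is the set of cliques (boundary cliques) such that $\exists\, x_i, x_j \in \mathbf{x}_c : x_i^* \in \mathcal{L}^p, x_j^* \notin \mathcal{L}^p$, and $\mathcal{O}^p$ is the set of outside cliques such that the optimal assignment for all the nodes belongs to the set $\mathcal{L} \setminus \mathcal{L}^p$. Let us define $\mathbf{x}^p$ as the labeling at node $p$. We prove the following lemma relating $\mathbf{x}^*$ and $\mathbf{x}^p$.

Lemma 1. Let $\mathbf{x}^p$ be the labeling at node $p$, $\mathbf{x}^*$ be the optimal labeling of the given hierarchical $P^n$ Potts model, and $\delta^{dia}(\Gamma(\mathbf{x}_c^p))$ be the diameter diversity based clique potential defined as $\max_{l_i, l_j \in \Gamma(\mathbf{x}_c^p)} d^t(l_i, l_j), \forall p$, where $d^t(.,.)$ is the tree metric defined over the given r-HST. Then the following bound holds true at any node $p$ of the r-HST:

\[
\sum_{c \in \mathcal{I}^p} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) \le \left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|) \sum_{c \in \mathcal{I}^p} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) \tag{19}
\]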
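Proof. We prove the above lemma by mathematical induction. Clearly, when $p$ is a leaf node, $x_i = p, \forall i \in \mathcal{V}$. For a non-leaf node $p$, we assume that the lemma holds true for the labeling $\mathbf{x}^q$ of all its children $q$. Given the labelings $\mathbf{x}^p$ and $\mathbf{x}^q$, we define a new labeling $\mathbf{x}^{pq}$ such that

\[
x_i^{pq} =
\begin{cases}
x_i^q & \text{if } x_i^* \in \mathcal{L}^q,\\
x_i^p & \text{otherwise.}
\end{cases}
\tag{20}
\]

Note that $\mathbf{x}^{pq}$ lies within one $\alpha$-expansion iteration of $\mathbf{x}^p$. Since $\mathbf{x}^p$ is a local minimum, we can say that

\[
E(\mathbf{x}^p \mid \mathcal{I}^q) + E(\mathbf{x}^p \mid \mathcal{B}^q) + E(\mathbf{x}^p \mid \mathcal{O}^q) \le E(\mathbf{x}^{pq} \mid \mathcal{I}^q) + E(\mathbf{x}^{pq} \mid \mathcal{B}^q) + E(\mathbf{x}^{pq} \mid \mathcal{O}^q) \tag{21}
\]

The terms over $\mathcal{O}^q$ cancel because the two labelings agree on all variables outside $\mathcal{V}^q$, which gives

\[
E(\mathbf{x}^p \mid \mathcal{I}^q) + E(\mathbf{x}^p \mid \mathcal{B}^q) \le E(\mathbf{x}^{pq} \mid \mathcal{I}^q) + E(\mathbf{x}^{pq} \mid \mathcal{B}^q) \tag{22}
\]

\[
\begin{aligned}
\sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) + \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p))
&\le \sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^{pq})) + \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^{pq}))\\
&= \sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^{q})) + \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^{pq}))
\end{aligned}
\tag{23}
\]

where the last step uses $\mathbf{x}_c^{pq} = \mathbf{x}_c^q$ for every $c \in \mathcal{I}^q$. Using the induction hypothesis we can write

\[
\sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) + \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) \le \left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|) \sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) + \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^{pq})) \tag{24}
\]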
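Now consider a clique $c \in \mathcal{B}^q$. Let $e^p$ be the length of the edges from node $p$ to its children $q$. Since $c \in \mathcal{B}^q$, there must exist at least two nodes $x_i$ and $x_j$ in $\mathbf{x}_c$ such that $x_i^* \in \mathcal{L}^q$ and $x_j^* \notin \mathcal{L}^q$; therefore, by the construction of the r-HST,

\[
\delta^{dia}(\Gamma(\mathbf{x}_c^*)) \ge 2e^p \tag{25}
\]

Furthermore, by the construction of $\mathbf{x}^{pq}$, $\Gamma(\mathbf{x}_c^{pq}) \subseteq \mathcal{L}^p$; therefore, in the worst case (leaf nodes), we can write

\[
\delta^{dia}(\Gamma(\mathbf{x}_c^{pq})) = \max_{l_i, l_j \in \mathcal{L}^p} d^t(l_i, l_j) \le 2e^p \left(1 + \frac{1}{r} + \frac{1}{r^2} + \cdots\right) = 2e^p \frac{r}{r-1} \le \left(\frac{r}{r-1}\right) \delta^{dia}(\Gamma(\mathbf{x}_c^*)) \tag{26}
\]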
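The geometric series used in inequality (26) can be spelled out as a short worked step. It assumes the standard r-HST property that edge lengths shrink by a factor of at least $r > 1$ from one level to the next, so that a path from $p$ down to any leaf has length at most $e^p(1 + 1/r + 1/r^2 + \cdots)$, and two leaves below $p$ are joined by at most two such paths:

```latex
\[
\max_{l_i, l_j \in \mathcal{L}^p} d^t(l_i, l_j)
\;\le\; 2 \sum_{k=0}^{\infty} \frac{e^p}{r^k}
\;=\; 2 e^p \cdot \frac{1}{1 - 1/r}
\;=\; 2 e^p \, \frac{r}{r-1}, \qquad r > 1 .
\]
```

For instance, with $r = 2$ and $e^p = 1$ the diameter of the subtree rooted at $p$ is at most 4, while any boundary clique already pays at least $2e^p = 2$ under the optimal labeling, which is where the factor $r/(r-1) = 2$ comes from.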


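From inequalities (24) and (26),

\[
\sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) + \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) \le \left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|) \sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) + \left(\frac{r}{r-1}\right) \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) \tag{27}
\]

In order to get the bound over the total energy we sum over all the children $q$ of $p$, denoted as $\eta(p)$. Therefore, summing the inequality (27) over $\eta(p)$ we get

\[
\begin{aligned}
\sum_{q \in \eta(p)} \sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) + \sum_{q \in \eta(p)} \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p))
\le{}& \left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|) \sum_{q \in \eta(p)} \sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*))\\
&+ \left(\frac{r}{r-1}\right) \sum_{q \in \eta(p)} \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*))
\end{aligned}
\tag{28}
\]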
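The LHS of the above inequality can be written as

\[
\begin{aligned}
\sum_{q \in \eta(p)} \sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) + \sum_{q \in \eta(p)} \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p))
&\ge \sum_{c \in \cup_{q \in \eta(p)} \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) + \sum_{c \in \cup_{q \in \eta(p)} \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^p))\\
&= \sum_{c \in \mathcal{I}^p} \delta^{dia}(\Gamma(\mathbf{x}_c^p))
\end{aligned}
\tag{29}
\]

The above inequality and equality are due to the fact that $\cap_{q \in \eta(p)} \mathcal{I}^q = \emptyset$, $\cap_{q \in \eta(p)} \mathcal{B}^q$ is not necessarily an empty set, $\delta^{dia}(\Gamma(\mathbf{x}_c)) \ge 0$, and $\mathcal{I}^p = \{\cup_{q \in \eta(p)} \mathcal{I}^q\} \cup \{\cup_{q \in \eta(p)} \mathcal{B}^q\}$. Now let us look at the second term of the RHS of the inequality (28):

\[
\sum_{q \in \eta(p)} \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) \le \sum_{c \in \cup_{q \in \eta(p)} \mathcal{B}^q} \min(|\eta(p)|, |\mathbf{x}_c|)\, \delta^{dia}(\Gamma(\mathbf{x}_c^*)) \tag{30}
\]

\[
\le \min\left(\max_{p} |\eta(p)|, \max_c |\mathbf{x}_c|\right) \sum_{c \in \cup_{q \in \eta(p)} \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) = \min(|\mathcal{L}|, \mathcal{M}) \sum_{c \in \cup_{q \in \eta(p)} \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) \tag{31}
\]

The inequality (30) is due to the fact that the double sum over $q \in \eta(p)$ and $c \in \mathcal{B}^q$ cannot count a clique more than $\min(|\eta(p)|, |\mathbf{x}_c|)$ times. Therefore, using the inequality (31) in the RHS of the inequality (28) we get

\[
\begin{aligned}
&\left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|) \sum_{q \in \eta(p)} \sum_{c \in \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) + \left(\frac{r}{r-1}\right) \sum_{q \in \eta(p)} \sum_{c \in \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*))\\
&\quad\le \left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|) \left( \sum_{c \in \cup_{q \in \eta(p)} \mathcal{I}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) + \sum_{c \in \cup_{q \in \eta(p)} \mathcal{B}^q} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) \right)\\
&\quad= \left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|) \sum_{c \in \mathcal{I}^p} \delta^{dia}(\Gamma(\mathbf{x}_c^*))
\end{aligned}
\tag{32}
\]

Finally, using inequalities (28), (29) and (32) we get

\[
\sum_{c \in \mathcal{I}^p} \delta^{dia}(\Gamma(\mathbf{x}_c^p)) \le \left(\frac{r}{r-1}\right) \min(\mathcal{M}, |\mathcal{L}|) \sum_{c \in \mathcal{I}^p} \delta^{dia}(\Gamma(\mathbf{x}_c^*)) \tag{33}
\]

□

Applying the above lemma to the root node proves the theorem. □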

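B.2. Multiplicative Bound of Algorithm-2 for the Parsimonious Labeling - Proof of Theorem-2

Proof. Let $d(.,.)$ be the induced metric of the given diversity $(\delta, \mathcal{L})$ and $\delta^{dia}$ be its diameter diversity. We first approximate $d(.,.)$ as a mixture of r-HST metrics $d^t(.,.)$. Using Theorem-3 we get the following relationship:

\[
d(.,.) \le O(\log |\mathcal{L}|)\, d^t(.,.) \tag{34}
\]

For a given clique $\mathbf{x}_c$, using Proposition-1, we get the following relationship:

\[
\delta^{dia}(\Gamma(\mathbf{x}_c)) \le \delta(\Gamma(\mathbf{x}_c)) \le (|\Gamma(\mathbf{x}_c)| - 1)\, \delta^{dia}(\Gamma(\mathbf{x}_c)) \tag{35}
\]

Therefore, using equations (35) and (34), we get the following inequality:

\[
\begin{aligned}
\delta^{dia}(\Gamma(\mathbf{x}_c)) \le \delta(\Gamma(\mathbf{x}_c)) &\le (|\Gamma(\mathbf{x}_c)| - 1)\, \delta^{dia}(\Gamma(\mathbf{x}_c))\\
&\le O(\log |\Gamma(\mathbf{x}_c)|)\, (|\Gamma(\mathbf{x}_c)| - 1)\, \delta_t^{dia}(\Gamma(\mathbf{x}_c))
\end{aligned}
\tag{36}
\]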
where $\delta_t^{dia}(\Gamma(\mathbf{x}_c))$ is the diameter diversity defined over the tree metric $d^t(.,.)$, which is obtained using the randomized algorithm [9] on the induced metric $d(.,.)$.

Hence, combining the inequality (36) and the previously proved Theorem-1 proves Theorem-2.

Notice that, in case our diversity is itself a diameter diversity, we do not need the inequality (35); therefore, the multiplicative bound reduces to $O\left(\left(\frac{r}{r-1}\right) \log |\mathcal{L}| \min(\mathcal{M}, |\mathcal{L}|)\right)$. □

Theorem 3. Given any distance metric function $d(.,.)$ defined over a set of labels $\mathcal{L}$, the randomized algorithm given in [9] produces a mixture of r-HST tree metrics $d^t(.,.)$ such that $d(.,.) \le O(\log |\mathcal{L}|)\, d^t(.,.)$.

Proof: Please see the reference [9].

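Proposition 1. Let $(\mathcal{L}, \delta)$ be a diversity with induced metric space $(\mathcal{L}, d)$; then the following inequality holds $\forall \Gamma \subseteq \mathcal{L}$:

\[
\delta^{dia}(\Gamma) \le \delta(\Gamma) \le (|\Gamma| - 1)\, \delta^{dia}(\Gamma) \tag{37}
\]

Proof: Please see the reference [4].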
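Proposition 1 can be checked numerically on a concrete diversity. The sketch below uses the $\ell_1$ diversity on points in $\mathbb{R}^k$ (sum over coordinates of the range of values), a choice made here purely for illustration and not used in the experiments; its induced metric is the $\ell_1$ distance, so the diameter diversity is the largest pairwise $\ell_1$ distance. The function names are hypothetical.

```python
from itertools import combinations


def l1_diversity(points):
    """Sum over coordinates of (max - min) across the given points."""
    if len(points) < 2:
        return 0.0
    return float(sum(max(col) - min(col) for col in zip(*points)))


def diameter_diversity(points):
    """Largest pairwise distance under the metric induced by l1_diversity."""
    return max(
        (l1_diversity([a, b]) for a, b in combinations(points, 2)),
        default=0.0,
    )


def satisfies_proposition_1(points, tol=1e-9):
    """Check dia <= delta <= (|Gamma| - 1) * dia for a set of points."""
    dia = diameter_diversity(points)
    delta = l1_diversity(points)
    return dia - tol <= delta <= (len(points) - 1) * dia + tol
```

For two points both inequalities hold with equality, which shows that the factor $(|\Gamma| - 1)$ only matters for cliques carrying three or more distinct labels.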